Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use squeeze instead of gsub in SquishNormalizer #53

Closed
wants to merge 2 commits into from

Conversation

samleb
Copy link
Contributor

@samleb samleb commented Mar 9, 2014

According to these benchmarks, using squeeze is an order of magnitude faster than gsub when dealing with squishing/squeezing spaces on almost all Ruby implementations, except on Rubinius where it doesn't make much of a difference.
Here is a patch that replaces gsub by squeeze in SquishNormalizer.normalize, with all tests green :)

@@ -2,7 +2,7 @@ module AttributeNormalizer
module Normalizers
module SquishNormalizer
def self.normalize(value, options = {})
value.is_a?(String) ? value.strip.gsub(/\s+/, ' ') : value
value.is_a?(String) ? value.strip.squeeze : value

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

irb> " hello\t world".strip.gsub /\s+/, ' '
=> "hello world"
irb> " hello\t world".strip.squeeze
=> "helo\t world"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're making an excellent point: the behavior of this normalizer regarding tabs & newlines isn't specified (only tested with normal spaces in specs), nor clearly documented:

  • :squish Will strip leading and trailing whitespace and convert any consecutive spaces to one space each

I was personally using this normalizer on varchar columns (usually represented with a simple <input> field, thus not really facing the tab/newline problem), but as soon as I went using it on text columns (with <textarea>s), I realized it was dropping newlines, a behavior that didn't suit my needs (basic simple_format), and that I personally found surprising, and deceptive according to the documentation.

IMHO, either the documentation & specs have to be updated accordingly (which makes this normalizer almost useless for multilines attributes), or the implementation has to be modified to follow the documentation (which will not be backward compatible).

BTW, maybe another normalizer could be introduced for multiline attributes, to remove all control characters except for newlines:

config.normalizers[:remove_control_characters] = lambda do |value, options|
  value.try(:gsub, /[[:cntrl:]&&[^\n]]/, '')
end

Anyway, I'd be happy to help :)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also:

irb> " hello\t world".strip.squeeze
=> "helo\t world"

Note the missing l.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh sorry about that I've totally missed the ' ' argument and thought the missing l was a typo.
I've corrected this in 748e2bd which is included in the PR.
I've also updated the original benchmarks to reflect this change

@mnaberez
Copy link

ActiveSupport provides a String#squish method. Currently AttributeNormalizer has an implementation like that in Rails 3. The ActiveSupport implementation changed in Rails 4. I assume AttributeNormalizer does not just call String#squish to avoid a dependency on ActiveSupport.

I am not in favor of changing this normalizer to use squeeze(' ') because I think it would be surprising for Rails users to have a normalizer called squish that behaves quite differently from String#squish. I do agree that the specs for this normalizer should be improved.

@samleb
Copy link
Contributor Author

samleb commented Mar 13, 2014

Oh, sorry about that, I've never really came across this ActiveSupport String#squish method but I completely agree with you now that it'd be surprising indeed if the SquishNormalizer didn't behave like String#squish does.
Anyway I think that the WhitespaceNormalizer does a perfect job as dealing with repeating spaces (and removing carriage returns too BTW), so I guess that was the behavior I was looking for in actual applications, it's just that as it wasn't in 1.1.0, I thought SquishNormalizer could do the job.
Thanks again and sorry for the disturbance.

@samleb samleb closed this Mar 13, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants